Time Complexity and Parallel Speedup to Compute the Gamma Summarization Matrix

Authors

  • Carlos Ordonez
  • Yiqun Zhang
Abstract

We study the serial and parallel computation of Γ (Gamma), a comprehensive data summarization matrix for linear Gaussian models, widely used in big data analytics. Computing Gamma can be reduced to a single matrix multiplication with the data set, which can be evaluated as a sum of vector outer products; this enables incremental and parallel computation, essential features for scalable processing. By exploiting Gamma, iterative algorithms are restructured to work in two phases: (1) incremental, parallel data set summarization (i.e., in one scan and distributive); (2) iteration in main memory, exploiting the summarization matrix in intermediate matrix computations (i.e., reducing the number of scans). Most intermediate computations on large matrices collapse to computations based on Gamma, a much smaller matrix. We present specialized database algorithms for dense and sparse matrices. Assuming a distributed-memory (i.e., shared-nothing) model and a larger number of points than processing nodes, we show that the parallel computation of Gamma achieves close to linear speedup. We explain how to compute Gamma with the processing mechanisms of existing database systems and discuss their impact on time complexity.
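The reduction described in the abstract (Gamma as a single matrix multiplication with the data set, evaluated as a sum of vector outer products) can be illustrated with a minimal sketch. It assumes the augmented row z_i = [1, x_i, y_i] used in related Gamma papers; the exact layout is an assumption here. Then Γ = Z^T Z = Σ_i z_i z_i^T, and partial Gammas computed on disjoint partitions simply add up, which is what makes the computation incremental, distributive, and parallel. Function names are illustrative.

```python
import numpy as np

def partial_gamma(X_part, y_part):
    """Gamma contribution of one data partition: sum of outer products z_i z_i^T,
    where z_i = [1, x_i, y_i] (augmented row; layout assumed, not from the paper)."""
    n = X_part.shape[0]
    Z = np.hstack([np.ones((n, 1)), X_part, y_part.reshape(-1, 1)])
    return Z.T @ Z  # one matrix multiplication == sum of vector outer products

def gamma(partitions):
    """Gamma is distributive: partial results from the workers are just added."""
    return sum(partial_gamma(Xp, yp) for Xp, yp in partitions)

# toy example: two "workers", each summarizing its own chunk in one pass
rng = np.random.default_rng(0)
X, y = rng.normal(size=(1000, 3)), rng.normal(size=1000)
parts = [(X[:500], y[:500]), (X[500:], y[500:])]
G = gamma(parts)
assert np.allclose(G, partial_gamma(X, y))  # same result as a serial single pass
```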


Related articles

Incremental Computation of Linear Machine Learning Models in Parallel Database Systems

We study the serial and parallel computation of Γ (Gamma), a comprehensive data summarization matrix for linear machine learning models widely used in big data analytics. We prove that computing Gamma can be reduced to a single matrix multiplication with the data set, where such multiplication can be evaluated as a sum of vector outer products, which enables incremental and parallel computation...


The Gamma Operator for Big Data Summarization on an Array DBMS

SciDB is a parallel array DBMS that provides multidimensional arrays, a query language and basic ACID properties. In this paper, we introduce a summarization matrix operator that computes sufficient statistics in one pass and in parallel on an array DBMS. Such sufficient statistics benefit a big family of statistical and machine learning models, including PCA, linear regression and variable sel...
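As a hedged illustration of how the sufficient statistics carried in such a summarization matrix benefit linear models: assuming the block layout z = [1, x_1..x_d, y] (an assumption, not taken from the paper), the mean and covariance needed for PCA and the normal equations of linear regression can be recovered from Gamma's blocks without rescanning the data.

```python
import numpy as np

def model_stats_from_gamma(G, d):
    """Recover mean/covariance (PCA inputs) and least-squares coefficients with
    intercept from a Gamma matrix with assumed layout z = [1, x_1..x_d, y]."""
    n = G[0, 0]                          # count of points
    L = G[1:1 + d, 0]                    # column sums of X
    Q = G[1:1 + d, 1:1 + d]              # X^T X
    mean = L / n
    cov = Q / n - np.outer(mean, mean)   # covariance matrix -> PCA eigendecomposition
    A = G[:1 + d, :1 + d]                # [1 X]^T [1 X]
    b = G[:1 + d, -1]                    # [1 X]^T y
    beta = np.linalg.solve(A, b)         # normal equations for y ~ [1 X] beta
    return mean, cov, beta

# toy usage: build Gamma in one pass, then derive model statistics from it alone
rng = np.random.default_rng(1)
X, y = rng.normal(size=(500, 4)), rng.normal(size=500)
Z = np.hstack([np.ones((500, 1)), X, y.reshape(-1, 1)])
mean, cov, beta = model_stats_from_gamma(Z.T @ Z, d=4)
```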


Using the Matrix Method to Compute the Degrees of Freedom of Mechanisms

In this paper, some definitions and traditional formulas for calculating the mobility of mechanisms are presented, e.g. the Grübler formula, the Somov-Malyshev formula, and the Buchsbaum-Freudenstein formula. It is discussed that there are certain cases in which they are too ambiguous and incorrect to use. However, a matrix method is suggested based on the rank of the Jacobian of the mechanism and its applica...
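A minimal sketch of the rank-based idea, not the paper's exact formulation: treating J as the constraint Jacobian with one column per joint variable, mobility can be estimated as the number of joint variables minus rank(J).

```python
import numpy as np

def mobility_from_jacobian(J):
    """Estimate mechanism mobility as (number of joint variables) - rank(constraint
    Jacobian); a generic rank-based sketch, not the paper's formulation."""
    num_joint_vars = J.shape[1]          # one column per joint variable
    return num_joint_vars - np.linalg.matrix_rank(J)

# toy example: 3 constraint rows over 4 joint variables, one constraint redundant
J = np.array([[1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [1., 1., 1., 1.]])         # third row = row1 + row2 (redundant)
print(mobility_from_jacobian(J))         # -> 2 degrees of freedom
```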


Speeding up the Stress Analysis of Hollow Circular FGM Cylinders by Parallel Finite Element Method

In this article, a parallel computer program based on the Finite Element Method is implemented to speed up the analysis of hollow circular cylinders made from Functionally Graded Materials (FGMs). FGMs are inhomogeneous materials whose composition varies gradually over the volume. In parallel processing, an algorithm is first divided into independent tasks, which may use individual or shared da...


A Parallel Algorithm with Embedded Load Balancing for Autocorrelation Matrix Computation

The computation of the autocorrelation matrix is used heavily in several areas including signal and image processing, where parallel and application-specific architectures are also being increasingly used. Therefore, an efficient scheme to compute the autocorrelation matrix on parallel architectures has tremendous benefits. In this paper, a parallel algorithm for the computation of the autocorrelation matrix...
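In the same sum-of-outer-products spirit as Gamma, a hedged sketch of computing an autocorrelation matrix in parallel (the paper's embedded load-balancing scheme is not reproduced here): each worker sums outer products of length-p lagged windows over its chunk of the signal, and the partial matrices are added.

```python
import numpy as np

def partial_autocorr(x_chunk, p):
    """Sum of outer products of length-p sliding windows over one signal chunk;
    partial matrices from different chunks simply add up."""
    R = np.zeros((p, p))
    for i in range(len(x_chunk) - p + 1):
        w = x_chunk[i:i + p]
        R += np.outer(w, w)
    return R

# toy example: split the signal into chunks that overlap by (p - 1) samples so no
# window is lost at the chunk boundary (a simplification, not the paper's scheme)
p = 4
x = np.random.default_rng(2).normal(size=2000)
chunks = [x[0:1003], x[1003 - (p - 1):2000]]
R = sum(partial_autocorr(c, p) for c in chunks)
assert np.allclose(R, partial_autocorr(x, p))  # matches the serial computation
```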




Publication year: 2016